Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use gRPC for queries #10997

Closed
wants to merge 1 commit into from
Closed

feat: use gRPC for queries #10997

wants to merge 1 commit into from

Conversation

troian
Copy link
Contributor

@troian troian commented Jan 21, 2022

fixes #10996

  • changelog
  • API breaking changes changelog
  • tests

Signed-off-by: Artur Troian troian.ap@gmail.com

@troian
Copy link
Contributor Author

troian commented Jan 21, 2022

@AmauryM in continue to our discord discussion. this is rather PoC which seems to work

Copy link
Member

@tac0turtle tac0turtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, could we get a changlog entry and possibly a test. Could be good to have parallel running queries to test that functionality.

client/cmd.go Show resolved Hide resolved
client/grpc_query.go Outdated Show resolved Hide resolved
client/context.go Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
client/context.go Outdated Show resolved Hide resolved
@troian troian force-pushed the issue-10996 branch 3 times, most recently from 6d7302a to 4389f2d Compare January 22, 2022 01:14
@troian troian force-pushed the issue-10996 branch 3 times, most recently from 73f3eb4 to d41ebfb Compare January 23, 2022 15:58
@github-actions github-actions bot added the C:Rosetta Issues and PR related to Rosetta label Jan 23, 2022
@troian
Copy link
Contributor Author

troian commented Jan 23, 2022

some updates to the PR.

  • updated the client so it tries to use gRPC by default and all tests seem to use it.
  • if --offline flag is set --grpc flag is ignored and connect phase omitted
  • rosetta test updated to use grpc

questions.

  1. --grpc defaults to localhost:9090 which means it tries to connect GRPC unless --grpc= is explicitly set to empty.
    GRPC server is on by default so should not be an issue.
  2. Should grpc default host value be based on --node?. For example --node=tcp://example.com:26657 but grpc still points to localhost which definitely will create confusion. Instead gRPC host could be derived from the node unless directly set by user. Thoughts?
rpcURI, _ := flagSet.GetString(flags.FlagNode)
if clientCtx.Client == nil || flagSet.Changed(flags.FlagNode) {
	if rpcURI != "" {
		clientCtx = clientCtx.WithNodeURI(rpcURI)

		client, err := NewClientFromNode(rpcURI)
		if err != nil {
			return clientCtx, err
		}

		clientCtx = clientCtx.WithClient(client)
	}
}

isOffline, _ := flagSet.GetBool(flags.FlagOffline)
if !isOffline && clientCtx.ClientConn == nil {
	grpcURI, _ := flagSet.GetString(flags.FlagGRPC)
	if grpcURI != "" {
		if !flagSet.Changed(flags.FlagGRPC) && rpcURI != "" {
			if uri, err := url.Parse(rpcURI); err == nil {
				guri, _ := url.Parse(grpcURI)
				grpcURI = uri.Hostname() + ":" + guri.Port()
			}
		}
		if gcl, err := grpc.DialContext(ctx, grpcURI, grpc.WithInsecure(), grpc.WithReturnConnectionError()); err == nil {
			clientCtx = clientCtx.WithClientConn(gcl)
		}
	}
}

@tac0turtle
Copy link
Member

--grpc defaults to localhost:9090 which means it tries to connect GRPC unless --grpc= is explicitly set to empty.
GRPC server is on by default so should not be an issue.

to replicate current behaviour I think it should be something like default to rpc, unless --grpc is passed. Defaulting to localhost here is fine.

2. Should grpc default host value be based on --node?. For example --node=tcp://example.com:26657 but grpc still points to localhost which definitely will create confusion. Instead gRPC host could be derived from the node unless directly set by user. Thoughts?

Making the user explicitly set a grpc endpoint there I think is best. There are many endpoint that expose the RPC but not grpc and vice versa.

@troian
Copy link
Contributor Author

troian commented Jan 24, 2022

do-not-merge
i've have found some incertitudes in client behaviour due to grpc.Dial as well as client.Context being passed by value. Need to give it a bit more think

--grpc flag conflicts with grpc object in generated app.toml.
looks like there are two options:

  1. rename flag to something like --grpc-node
  2. rename object in app.toml to grpc-server

I tend to option 2 as the flag on the client side will be often used by people so they have to type less. As GRPC is not yet widely adopted in cosmos ecosystem, renaming server block to grpc-server or similar should not create issues.

@amaury1093 amaury1093 marked this pull request as draft January 24, 2022 12:47
@amaury1093 amaury1093 added the S: DO NOT MERGE Status: DO NOT MERGE label Jan 24, 2022
@amaury1093
Copy link
Contributor

i personally prefer option 1, because it's keep backwards compatibility. Renaming app.toml fields requires its own migration path, which is more effort.

Also, on the client side, we already have the --node flag, so --grpc-node makes sense to me.

@troian
Copy link
Contributor Author

troian commented Jan 26, 2022

we have a weird race in tests happening when gRPC is on. that's with cosmos v0.44.5. Any thoughts?
https://pastebin.com/3egjGUu1

it happens on the network cleanup

@tac0turtle
Copy link
Member

@troian let me know how I can help push this over the finish line. Secret wants to test this to see if it fixes something else.

fixes #10996

Signed-off-by: Artur Troian <troian.ap@gmail.com>
Copy link
Contributor

@amaury1093 amaury1093 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @troian, do you think this is ready for review?

For me it looks good, maybe #10997 (comment) is the only comment I have left

@troian
Copy link
Contributor Author

troian commented Jan 30, 2022

@AmauryM this is the race we're seeing. considering we're on v0.44.5 I suppose there can be things (like rw mutex) added somewhere after v0.44.5. thoughts?

the app(BaseApp).checkState.ctx is being accessed same time within separate goroutines.
it is one of the many race failures we see but they all end up with same source

WARNING: DATA RACE
Write at 0x00c002a6cb50 by goroutine 125:
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).BeginBlock()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/cosmos/cosmos-sdk/baseapp/abci.go:187 +0x9b4
  github.com/ovrclk/akash/app.(*AkashApp).BeginBlock()
      <autogenerated>:1 +0x90
  github.com/tendermint/tendermint/abci/client.(*localClient).BeginBlockSync()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/abci/client/local_client.go:280 +0x120
  github.com/tendermint/tendermint/proxy.(*appConnConsensus).BeginBlockSync()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:81 +0x8c
  github.com/tendermint/tendermint/state.execBlockOnProxyApp()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/state/execution.go:307 +0x480
  github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/state/execution.go:140 +0x180
  github.com/tendermint/tendermint/consensus.(*State).finalizeCommit()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/consensus/state.go:1635 +0xda8
  github.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/consensus/state.go:1546 +0x468
  github.com/tendermint/tendermint/consensus.(*State).enterCommit.func1()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/consensus/state.go:1481 +0x11c
  github.com/tendermint/tendermint/consensus.(*State).enterCommit()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/consensus/state.go:1519 +0x1264
  github.com/tendermint/tendermint/consensus.(*State).addVote()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/consensus/state.go:2132 +0x11fc
  github.com/tendermint/tendermint/consensus.(*State).tryAddVote()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/consensus/state.go:1930 +0x48
  github.com/tendermint/tendermint/consensus.(*State).handleMsg()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/consensus/state.go:838 +0x51c
  github.com/tendermint/tendermint/consensus.(*State).receiveRoutine()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/tendermint/tendermint/consensus/state.go:782 +0x5a0

Previous read at 0x00c002a6cb50 by goroutine 145:
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).createQueryContext()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/cosmos/cosmos-sdk/baseapp/abci.go:648 +0x3c4
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func1()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/cosmos/cosmos-sdk/baseapp/grpcserver.go:50 +0x2c0
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x78
  github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:33 +0xb8
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x78
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0xf4
  github.com/cosmos/cosmos-sdk/x/bank/types._Query_AllBalances_Handler()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/cosmos/cosmos-sdk/x/bank/types/query.pb.go:943 +0x1bc
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).RegisterGRPCServer.func2()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/github.com/cosmos/cosmos-sdk/baseapp/grpcserver.go:80 +0x124
  google.golang.org/grpc.(*Server).processUnaryRPC()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/google.golang.org/grpc/server.go:1210 +0x11b4
  google.golang.org/grpc.(*Server).handleStream()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/google.golang.org/grpc/server.go:1533 +0xed8
  google.golang.org/grpc.(*Server).serveStreams.func1.2()
      /Users/amr/go/src/github.com/ovrclk/akash/vendor/google.golang.org/grpc/server.go:871 +0xa4

also running real testnet with clients using grpc reproduces the issue
https://pastebin.com/P0BgTyH1

@fdymylja fdymylja mentioned this pull request Feb 1, 2022
19 tasks
@troian
Copy link
Contributor Author

troian commented Feb 2, 2022

@AmauryM thoughts on the last comment?

@amaury1093
Copy link
Contributor

amaury1093 commented Feb 2, 2022

Oh wow, it seems like you found a server-side bug that's in production.

It was maybe just that no-one ever used the gRPC endpoint directly, that's why we never saw this issue.

@troian Would you like to add a mutex on app.checkState? I can also do this tomorrow (maybe it's cleaner to do the mutex in another PR)

@troian
Copy link
Contributor Author

troian commented Feb 2, 2022

yeah, i have pr in my drafts. still doing some tests to make sure it work

@tac0turtle
Copy link
Member

the app(BaseApp).checkState.ctx is being accessed same time within separate goroutines.
it is one of the many race failures we see but they all end up with same source

this is with this branch being used? I know people running queries in parallel and this is the first time seeing this

@amaury1093
Copy link
Contributor

this is with this branch being used? I know people running queries in parallel and this is the first time seeing this

I believe 0.44.5 as #10997 (comment). Is there a possibility they all used client.Context to run parallel queries, which under the hood pinged TM rpc abci_query instead of the gRPC server?

@tac0turtle
Copy link
Member

this is with this branch being used? I know people running queries in parallel and this is the first time seeing this

I believe 0.44.5 as #10997 (comment). Is there a possibility they all used client.Context to run parallel queries, which under the hood pinged TM rpc abci_query instead of the gRPC server?

I think this is the case, which explains why teams like terra and secret were running into non parallel queries when trying to do it

@github-actions
Copy link
Contributor

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 21, 2022
grpcURI, _ := flagSet.GetString(flags.FlagGRPCNode)

if grpcURI != "" {
grpcKeepAlive, _ := flagSet.GetDuration(flags.FlagGRPCKeepAlive)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the error.

@@ -147,6 +150,23 @@ func ReadPersistentCommandFlags(clientCtx Context, flagSet *pflag.FlagSet) (Cont
}
}

if clientCtx.ClientConn == nil || flagSet.Changed(flags.FlagGRPCNode) {
grpcURI, _ := flagSet.GetString(flags.FlagGRPCNode)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the error.

Comment on lines +164 to +166
if err == nil {
clientCtx = clientCtx.WithClientConn(gcl)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer returning a zero-value Context when the error is non-nil.

@@ -91,6 +94,8 @@ var LineBreak = &cobra.Command{Run: func(*cobra.Command, []string) {}}
// AddQueryFlagsToCmd adds common flags to a module query command.
func AddQueryFlagsToCmd(cmd *cobra.Command) {
cmd.Flags().String(FlagNode, "tcp://localhost:26657", "<host>:<port> to Tendermint RPC interface for this chain")
cmd.Flags().String(FlagGRPCNode, "", "<host>:<port> to GRPC interface for this chain")
cmd.Flags().Duration(FlagGRPCKeepAlive, 20*time.Second, "set GRPC keepalive. defaults to 20s")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to re-iterate the default value in the help text, as it is printed anyway.

@fedekunze
Copy link
Collaborator

fedekunze commented Apr 4, 2022

we reopening this as it's needed for Ethermint

@fedekunze fedekunze reopened this Apr 4, 2022
@tac0turtle
Copy link
Member

LFG

@troian
Copy link
Contributor Author

troian commented Apr 5, 2022

@marbar3778 this issue needs to be addressed first #11102

@crypto-facs
Copy link

@marbar3778 this issue needs to be addressed first #11102

Correct.. I patched the 0.45.2 release with this PR to test this on Ethermint and I am currently seeing the following error

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x49f513f]

goroutine 1 [running]:
github.com/cosmos/cosmos-sdk/client.Context.Invoke({{0x0, 0x0, 0x0}, {0x5fe5e50, 0xc0004c4a60}, {0xc0034bdf90, 0x10}, {0x5fab470, 0xc0015ba030}, {0x5fd7a98, ...}, ...}, ...)
	github.com/cosmos/cosmos-sdk@v0.45.2/client/grpc_query.go:84 +0x47f
github.com/tharsis/ethermint/x/evm/types.(*queryClient).Params(0xc003a8a0b0, {0x5fa1a08, 0xc000136008}, 0x5f6e180, {0x0, 0x0, 0x0})
	github.com/tharsis/ethermint/x/evm/types/query.pb.go:1377 +0xce
github.com/tharsis/ethermint/rpc/ethereum/backend.(*EVMBackend).ChainConfig(0xc001037c00)
	github.com/tharsis/ethermint/rpc/ethereum/backend/backend.go:948 +0x63
github.com/tharsis/ethermint/rpc/ethereum/namespaces/eth.NewPublicAPI({_, _}, {{0x0, 0x0, 0x0}, {0x5fe5e50, 0xc0004c4a60}, {0xc0034bdf90, 0x10}, {0x5fab470, ...}, ...}, ...)
	github.com/tharsis/ethermint/rpc/ethereum/namespaces/eth/api.go:88 +0x215
github.com/tharsis/ethermint/rpc.GetRPCAPIs(_, {{0x0, 0x0, 0x0}, {0x5fe5e50, 0xc0004c4a60}, {0xc0034bdf90, 0x10}, {0x5fab470, 0xc0015ba030}, ...}, ...)
	github.com/tharsis/ethermint/rpc/apis.go:54 +0x3af
github.com/tharsis/ethermint/server.StartJSONRPC(_, {{0x0, 0x0, 0x0}, {0x5fe5e50, 0xc0004c4a60}, {0xc0034bdf90, 0x10}, {0x5fab470, 0xc0015ba030}, ...}, ...)
	github.com/tharsis/ethermint/server/json_rpc.go:40 +0x24e
github.com/tharsis/ethermint/server.startInProcess(_, {{0x0, 0x0, 0x0}, {0x5fe5e50, 0xc0004c4a60}, {0x59ed61d, 0x7}, {0x5fab470, 0xc0015ba030}, ...}, ...)
	github.com/tharsis/ethermint/server/start.go:429 +0x192c
github.com/tharsis/ethermint/server.StartCmd.func2(0xc000568c80, {0xc0013646c0, 0x0, 0x9})
	github.com/tharsis/ethermint/server/start.go:117 +0x228
github.com/spf13/cobra.(*Command).execute(0xc000568c80, {0xc001364630, 0x9, 0x9})
	github.com/spf13/cobra@v1.4.0/command.go:856 +0x60e
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001ea000)
	github.com/spf13/cobra@v1.4.0/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.4.0/command.go:902
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.4.0/command.go:895
github.com/cosmos/cosmos-sdk/server/cmd.Execute(0x6ee62c8, {0xc00139ab40, 0x20})
	github.com/cosmos/cosmos-sdk@v0.45.2/server/cmd/execute.go:36 +0x1eb
main.main()
	github.com/tharsis/ethermint/cmd/ethermintd/main.go:20 +0x38

@troian
Copy link
Contributor Author

troian commented Apr 5, 2022

@crypto-facs highly possible. last time i worked on the pr more than a month ago.
we still want to get it done, just don't have enough capacity

@fedekunze
Copy link
Collaborator

@marbar3778 @crypto-facs i know that oKEX forked the SDK and added this functionality. We should look into their implementation

@alexanderbez
Copy link
Contributor

I think the solution here is very simple as laid out by @AmauryM's proposal in the main issue. I think we should close this as it's pretty out of date and stale. I'll be happy to take on the issue and PR if that's OK @troian?

@troian
Copy link
Contributor Author

troian commented May 2, 2022

@alexanderbez sounds good. closing

@troian troian closed this May 2, 2022
@tac0turtle tac0turtle deleted the issue-10996 branch February 16, 2023 00:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:CLI C:Rosetta Issues and PR related to Rosetta C:x/group S: DO NOT MERGE Status: DO NOT MERGE
Projects
None yet
Development

Successfully merging this pull request may close these issues.

client.Context gRPC queries should use the gRPC port instead of abci_query
7 participants